Tabular Q solution example #125
This is really nice @JD-ETH, thanks a lot! Having a value-iteration baseline is a very welcome addition.
In general the implementation is solid, but since you asked, I have gone through and left a very pedantic review 😃 Don't worry, this level of scrutiny is not normal! Most things are small nitpicks or bikeshedding. The most important change I'd love to see is to bring the level of commenting up to the point of this being a reasonably-standalone explanation of the technique for a general non-expert audience.
Leave a comment here once you want me to take another look.
Cheers,
Chris
Oo one thing I forgot - please add a test 😊 Given that this isn't core API, a simple smoke test using flags for a quick run like You will then need to add bazel definitions so that the test is run on EDIT: Also worth noting that this test will fail until this PR is rebased on #126.
@ChrisCummins I'm ready for the next round.
Hi @JD-ETH, thanks a lot for making those changes, I really appreciate the extra docs and the test!
This LGTM, so feel free to merge it when you're ready. You may want to squash the history into smaller atomic commits before merging, perhaps one for the brute_force fix and another for tabular_q, but that's not essential so I'll leave it up to your discretion :) Thanks for adding this new baseline!
Cheers,
Chris
Co-authored-by: Chris Cummins <chrisc.101@gmail.com>
@ChrisCummins Now I see that CI is failing on this PR. Sorry for merging it before the CI tests finished. I am not sure I understand what failed in the test, though. Could you give me some pointers?
Thanks for checking. It's a false positive, nothing to worry about :). See #144.
@ChrisCummins
A working example with tabular Q-learning.
As this is my first real PR please be harsh about formatting, logging, commenting, styling, and point me to guidelines that I have missed. Thanks!
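For readers unfamiliar with the technique, here is a minimal sketch of tabular Q-learning on a toy problem. It is not the code in this PR: the chain environment, `step`, `train`, `greedy_policy`, and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy environment (not from this PR): a 1-D chain of states
# 0..N_STATES-1. Action 0 moves left, action 1 moves right. Reaching the
# rightmost state yields reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Run epsilon-greedy tabular Q-learning and return the Q table."""
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) -> estimated return
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: pick a uniformly random action.
                action = rng.choice(ACTIONS)
            else:
                # Exploit: pick a best-valued action, breaking ties randomly.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: bootstrap from the best next action.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            td_error = reward + gamma * best_next - q[(state, action)]
            q[(state, action)] += alpha * td_error
            state = next_state
    return q


def greedy_policy(q):
    """Extract the greedy action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` on this toy chain should select "move right" in every non-terminal state. The table is keyed on `(state, action)` pairs, so the approach only scales to problems where the visited state space stays small.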